Abstract



This work presents a new method for the analysis of one-dimensional stochastic systems,
based on their higher order difference structure. Both deterministic and random
processes are analyzed from this point of view. A new numerical characteristic
for one-dimensional stochastic systems is introduced. Applications to
single neuron models and neural networks are given.
1 Introduction

This paper presents some applications of the difference analysis suggested in
the authors' recent works [1-3]. The approach arose from a computational study
[1] of neural activity: we observed that sequences of higher order absolute differences,
taken from a periodically stimulated neuron's spike train, contain long samples on which
the changes in monotonicity (increase/decrease) are periodic.

Section 2 generalizes this observation and introduces a new characteristic
for stochastic one-dimensional systems, expressing this type of non-explicit periodicity in
a quantitative form. This notion, the numerical characteristic $\gamma$, being a measure of some
minimal periodicity, can also be treated as a new measure of irregularity. For
some nonlinear maps, numerical comparisons of $\gamma$ with the Lyapunov exponent are given.

The difference approach provides a strict algorithmic basis for the detection of small chaotic
fluctuations at unstable bifurcation points. This is shown for the logistic map; however, the
same analysis can be applied to various systems exhibiting bifurcations.

Probabilistic systems permit rigorous study: Section 3 considers discrete random
processes. Sequences of independent random variables as well as binary Markov
processes are examined. In fact, we deal with a new type of limiting transition, taken
over the discrete random process. Applications to diffusion and Poisson processes
are given. We establish several theoretical results relating to the $\gamma$-characteristic of random
discrete systems. We rely on the Eggleston theorem [7] from ergodic theory to describe
the attractors for these random processes.

Section 4 considers stochastic models of a single neuron and establishes a
probabilistic firing condition for the neuron. We again rely on the difference analysis and
use a modified version of entropy. We claim that a new type of attractor, arising
in the suggested approach, can be treated as a new extended memory in neural networks.
This appears to be consistent with the brain theory concepts of attractor computing
and associative hierarchical memory [8]. Finally, we prove a theorem showing that, if one
follows the difference approach, the noise can in fact be eliminated from an arbitrary "noisy"
neural network.
The paper is organized in such a way that most of the theoretical results can be easily
deduced from the previous ones. The complete proofs of the other theoretical statements
will appear in the version of this work submitted for publication in a regular journal.

2 Deterministic processes 

2.1 Finite differences and conjugate orbits 

The difference approach suggested in [1-3] reduces the study of stochastic properties
of the orbits $X = (x_i)_{i=1}^{\infty}$, generated by a given one-dimensional system, to the analysis of
alternations of the monotone increase or decrease of the higher order absolute differences

$$\Delta^{(s)} x_i = \left| \Delta^{(s-1)} x_{i+1} - \Delta^{(s-1)} x_i \right| \qquad (\Delta^{(0)} x_i = x_i;\ i, s = 1, 2, 3, \ldots).$$

In this section we describe some statements of the approach and explain some basic notions 
involved in this work. 
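The definition of the higher order absolute differences can be tried out numerically. The following is a minimal Python sketch, not the authors' code: the helper names `abs_differences` and `monotonicity_symbols` are hypothetical, and encoding an increase as 1 and anything else as 0 is an assumption made here only for illustration.

```python
def abs_differences(x, s):
    """Return the s-th order absolute differences of the finite orbit x.

    Order 0 is the orbit itself; each further order replaces the sequence
    by the absolute values of its consecutive differences.
    """
    d = list(x)
    for _ in range(s):
        d = [abs(b - a) for a, b in zip(d, d[1:])]
    return d


def monotonicity_symbols(d):
    """Encode each step of d as 1 (increase) or 0 (no increase).

    This binary encoding is an illustrative assumption.
    """
    return [1 if b > a else 0 for a, b in zip(d, d[1:])]


# Example: a short orbit of the logistic map x -> r x (1 - x) with r = 4.
orbit = [0.3]
for _ in range(9):
    orbit.append(4.0 * orbit[-1] * (1.0 - orbit[-1]))

second = abs_differences(orbit, 2)      # 8 values from a 10-point orbit
symbols = monotonicity_symbols(second)  # 7 binary symbols
```

Each order shortens a finite orbit by one term, so an orbit of length $k$ supports differences up to order $k - 1$.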

Let a one-dimensional stochastic system generate numerical sequences $X = (x_i)_{i=1}^{\infty}$,
$0 < x_i < 1$. It is not difficult to see that for $1 \le s \le k - 1$ we have

A(-^)x, = + A(^)x, - min (^(-1)^^^ A^^)^,) (1) 

p=l p=l 

where 

'^^^ { 1 AW^I, < Ai^hl = mm{A(^)x. : 1 < ^ < k - s} , 

(it is assumed that $\sum_{p=1}^{0} = 0$). Using the recurrence formula (1) we transform the finite orbit
into a special form, which emphasizes its higher order difference structure. Namely,
we transform $X_k = (x_i)_{i=1}^{k}$ into the sequence $\zeta_k$,

Ck = (Afc, fik, Pk) (2) 

where 

Afc = (Ai, A2, . . . , Afc), A, = Q.S'f^S'i^ Si'l^ = 0, 1, . . . , m) (3) 

fit = {f^k,l, f^k,2, ■ ■ ■ ,l^k,m), Pk = {Pk,l, Pk,2, ■ ■ ■ , Pk,k-m) {Pk,i = A-™"*) . (4) 

Eq. (2) in fact represents the original orbit in a different form: one can see that,
applying the recurrent procedure (1), the sequence $X_k$ can be completely recovered from
$\zeta_k$. The difference method suggests studying the sequences $\nu = (\nu_k)_{k=1}^{\infty}$ whose terms are
defined as follows:

= O.sPsi'^Si'^ ... (5) 

- it is clear that 

Wk - h\ < 2-' (A; > 1) . (6) 



The sequence $\nu$ is called the conjugate (to $X$) orbit. The approach distinguishes two cases:
the continuous case, when the quantities $\rho_{k,i}$ from (4) can take arbitrary numerical values from
the interval (0, 1), and the discrete case, when they take only a finite number of values. In
either case, given $X$ we are interested in its higher order difference structure, which is reflected
by the conjugate orbit $\nu$.

The computations performed in [1-2] show that for many actual continuous-time systems
the quantities

m 

\\Mf+m\^ (ii(xi,...,xjii = ($:^?)'/') (7) 

as well as (due to relations (2) and (6)) the distances $\|\zeta_k - \nu_k\|$ converge to zero at
an exponential rate. In contrast, the sequences $\nu_k$ from (5) mostly have an oscillating
character and are attracted either to an interval or to a thin set $A$. This means that the
dominant part of the information that is carried by the original time series $X$ and that permits
measurement is in fact conveyed by its conjugate orbit $\nu$. For many irregular systems
the set $A$ is the same for different orbits determined by different initial states. Hence,
$A$ appears to be a (conjugate) attractor for the system. Therefore, the main part
of the information produced by the system during its evolution is contained in the attractor $A$,
which can be treated as a geometrical image of the whole produced information.

Systems generating natural numbers belonging to a finite set (e.g., the
outcomes in games of chance) should be classified as the discrete case. The conjugate orbits
are constructed in the same way: for instance, if $X = (x_i)_{i=0}^{\infty}$ and $x_i \in \{0, 1\}$, then the
difference sequences $X^{(k)} = (x_i^{(k)})_{i=0}^{\infty}$ are again binary sequences and the conjugate
orbits consist of the terms

$$\nu_k = \sum_{n=1}^{\infty} 2^{-n} x_n^{(k)}.$$

However, in this case we do not have the convergence of (7) to zero as for continuous-time
systems. Instead, by prescribing probabilities to the generated outcomes (i.e., considering
random sequences) we are able to compare the original and conjugate systems
through their analytical parameters (see Section 3 for details). The sequences $\nu_k$,
considered on some infinite subsets of indices $\Lambda \subset N$ of the natural series $N$, converge
to some compact sets in (0, 1), which can be disjoint for different $\Lambda$. For sequences
of independent random variables and Markov chains these cluster sets permit an analytical
description. Moreover, the suggested analysis can be applied to arbitrary continuous-time
stochastic systems that permit approximation or interpolation by discrete ones. E.g.,
Section 3 gives such an application to diffusion stochastic processes. As another example
(not considered in this work) one can refer to the Nyquist-Shannon sampling theorem
[8, 32] from data analysis, according to which an analogue signal with a bounded power
spectrum is completely determined by its values on a discrete set of equidistant
points.
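For the binary case described above, the absolute difference of two bits is their XOR, so every difference sequence is again binary. A minimal Python sketch follows. The names `binary_difference` and `binary_fraction` are hypothetical, and reading a finite prefix of a binary sequence as the number $0.b_1 b_2 b_3 \ldots$ is an assumption based on the binary-expansion convention used elsewhere in the text.

```python
def binary_difference(bits, k=1):
    """k-th order absolute difference of a 0/1 sequence (|a - b| == a ^ b)."""
    d = list(bits)
    for _ in range(k):
        d = [a ^ b for a, b in zip(d, d[1:])]
    return d


def binary_fraction(bits):
    """Value of the finite binary expansion 0.b1 b2 b3 ..."""
    return sum(b / 2.0 ** (n + 1) for n, b in enumerate(bits))


x = [0, 1, 1, 0, 1, 0, 0, 1]
first = binary_difference(x, 1)    # [1, 0, 1, 1, 1, 0, 1]
second = binary_difference(x, 2)   # [1, 1, 0, 0, 1, 1]
value = binary_fraction(second)    # finite approximation of a conjugate term
```

Only a finite prefix is available in practice, so `binary_fraction` approximates the infinite expansion to within $2^{-L}$, where $L$ is the prefix length.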

2.2 A new characteristic for irregular systems 

One of the main claims relating to the approach described, which has been confirmed by 
preliminary computations ([3]), is the following: a periodic response of given stochastic 






system to a weak periodic perturbation is localized in $X^{(N)}$: its presence can be detected
by considering the higher order differences taken from the initial orbit $X$. For that purpose
one should study the asymptotic (as $N \to \infty$) relative volume (the density in the natural
series) of the set of all those indices $i$ for which a change of the binary symbol occurs,

C^ = l-<^f^ (l<i<7V-l). (8) 

This leads to the following definition: for an orbit $X = (x_k)_{k=0}^{\infty}$ of a given system we
define

$$\gamma = \gamma(X) = \lim_{N\to\infty} \frac{\gamma(X, N)}{N} \qquad (0 \le \gamma \le 1) \tag{9}$$

where $\gamma(X, N)$ denotes the total number of those indices $1 \le i \le N - 1$ for each of which
(8) holds; the existence of the limit in (9) is assumed.

For deterministic systems the theoretical study of the properties of $\gamma$ is quite difficult. The two
statements of this section, Theorem 1 and Corollary 1, are apparently the only available
theoretical results. Some theoretical statements relating to random binary sequences
can be found in Section 3. On the other hand, $\gamma$ has the important advantage
of being simple to compute. It can be computed directly from the
(experimental) data $X$, without reference to the law generating the process. Since
the computation of $\gamma$ requires only the corresponding time series, this
new characteristic is well adapted to the computational study of various applied problems.

The numerical analysis shows (see [3]) that $\gamma$ depends weakly on the system's initial
values and is able to distinguish regular and chaotic motion. To this end we have
compared $\gamma$ with the Lyapunov exponent $\lambda$ (see e.g., [12], Ch. 5 and [4], Ch. 7.2.b): for a
given map $F : (0, 1) \to (0, 1)$ it is defined as

$$\lambda = \lambda(X) = \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} \ln \left| \frac{dx_{k+1}}{dx_k} \right| \qquad (x_{k+1} = F(x_k)). \tag{10}$$

It is known [4] that the Lyapunov exponent of any integrable system is zero. In contrast, it
follows from Theorem 1 below that there exist integrable (but aperiodic) systems with
positive (and rational) $\gamma$; an example is the sequence of fractional parts. Some results on
computations of the $\gamma$-characteristic as well as its comparison with the Lyapunov exponent can
be found in [3]. Three simple deterministic systems have been examined: the tent map, the
logistic function, and the Poincaré displacement of Chirikov's standard map. The numerical
results demonstrate a strong correlation of $\gamma$ with the Lyapunov exponent (see [3], Figs. 1
and 2).
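Definition (10) can be evaluated directly for a map with a known derivative. The sketch below does this for the logistic map $F(x) = r x (1 - x)$, whose derivative is $r(1 - 2x)$. It is an illustration only: the function name, the initial point, the transient length and the number of averaged steps are arbitrary choices made here, not values taken from [3].

```python
import math


def lyapunov_logistic(r, x0=0.3, n_transient=1000, n_steps=100_000):
    """Estimate (1/N) * sum ln|F'(x_k)| for F(x) = r x (1 - x)."""
    x = x0
    for _ in range(n_transient):        # discard the initial transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        derivative = abs(r * (1.0 - 2.0 * x))
        total += math.log(max(derivative, 1e-300))  # guard against log(0)
        x = r * x * (1.0 - x)
    return total / n_steps


regular = lyapunov_logistic(2.5)   # orbit settles on the fixed point x = 0.6
chaotic = lyapunov_logistic(3.9)   # parameter in the chaotic regime
```

For $r = 2.5$ the orbit converges to the fixed point $x = 0.6$, where $|F'(x)| = 0.5$, so the estimate approaches $\ln 0.5 < 0$; a positive estimate signals chaotic motion.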

We emphasize another important feature of this quantity, derived from these
preliminary computations: the $\gamma$-coefficient is a numerical characteristic, associated
with a given stochastic system, whose value can change significantly
when the system undergoes a weak perturbation. This relates to stochastic resonance
phenomena. Some works [24, 25, 26] claim that precisely this resonance mechanism plays
the basic role in neural activity.

We have proven two rigorous results relating to the $\gamma$-characteristic of continuous
deterministic systems: Theorem 1 and Corollary 1. Corollary 1 follows from Theorem
2, which gives a 'difference' analogue of the Eggleston formula from ergodic theory. If $X$ is
either constant or periodic, then we obviously have

$$\gamma(X, N) = \gamma N + O(1) \qquad (N \to \infty) \tag{11}$$

and the coefficient $0 \le \gamma \le 1$ is rational. According to the next theorem (the symbols $\{\cdot\}$
and $[\cdot]$ denote the fractional and integer part of a number) this remains valid also for sequences
of fractional parts. These (conditionally periodic) sequences play an important role in
the study of general integrable dynamical systems (see [34, 35]).

Theorem 1 For the sequence $X = (\{\alpha n\})_{n=1}^{\infty}$, where $0 < \alpha < 1$ is irrational, the following
statements are true: (1) the orbit $\nu = (\nu_n)_{n=1}^{\infty}$ conjugate to $X$ is a periodic sequence; (2) if
the integer part of $1/\alpha$ is of the form $[1/\alpha] = 2^p - 1$ $(p > 1)$, then $\nu_n = 0$ for all large enough
indices $n$.

The second theoretical result on the $\gamma$-characteristic is Corollary 1, which establishes for
binary systems an upper estimate for the Hausdorff dimension of the attractors $A$ (see the previous
Section 2.1). To formulate this estimate, which involves the $\gamma$-characteristic and the Shannon entropy
function, we need some preliminary definitions and results. Let us consider processes
generating the binary sequences

$$X = (x_1, x_2, \ldots, x_n, \ldots), \qquad x_n \in \{0, 1\}; \tag{12}$$

it is convenient to assign to such a sequence the number $0 \le x \le 1$,

$$x = 0.x_1 x_2 \ldots x_n \ldots \quad \Big(= \sum_{n=1}^{\infty} 2^{-n} x_n\Big).$$

We define $B_K$ as the collection of all real numbers $0 \le x \le 1$ for which the sequences
(12) contain only bounded (by a given number $K$) series of the same binary symbol.

If all of the difference sequences $x^{(k)}$ belong to $B_K$, then $x$ is called [2] a $\beta_K$ sequence. Some
necessary and sufficient conditions for a sequence $x$ to be a $\beta_K$ sequence can be found in [2].
The Eggleston theorem ([7]) states that

$$\dim\Big(\Big\{x \in (0, 1) : \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} x_i = p\Big\}\Big) = H(p) \tag{13}$$

where the notation $\dim(E)$ stands for the Hausdorff dimension of the set $E$ and

$$H(x) = x \log_2 \frac{1}{x} + (1 - x) \log_2 \frac{1}{1 - x} \qquad (0 < x < 1) \tag{14}$$

is the Shannon entropy function. This function can also be derived from the theory of number
partitions: if $C(s, N)$ denotes the total number of compositions of the number $N$ into $s$ parts,

$$N = m_1 + m_2 + \cdots + m_s, \tag{15}$$

then, since $C(s, N) = C_{N-1}^{s-1}$, using the Stirling formula for binomial coefficients it can be
easily obtained that

$$H(x) = \lim_{N\to\infty} \frac{1}{N} \log_2 C(xN, N) \qquad (0 < x < 1). \tag{16}$$



Analogously, given $K > 1$ we define

$$H_K(x) = \lim_{N\to\infty} \frac{1}{N} \log_2 C_K(xN, N) \qquad (0 < x < 1) \tag{17}$$

where $C_K(s, N)$ denotes ([9]; [10], Ch. 4.2) the total number of compositions (15) satisfying
the restriction $m_i \le K$. To be correct in these definitions, one should assume that $x$ is rational:
if $x = p/q$, then $N$ in (16) and (17) is of the form $N = qn$ with $n \to \infty$. Then the
function (16) (as well as (17)) can be extended to the whole unit interval $0 < x < 1$.
Concerning relation (16) and the procedure just used for defining the Shannon entropy,
see also [36, 37].
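Relation (16) can be checked numerically for a rational $x = p/q$ by taking $N = qn$ with a large $n$. The following Python sketch is an illustration under that assumption; the function names are hypothetical, and the count of compositions of $N$ into $s$ positive parts is taken as the binomial coefficient $C_{N-1}^{s-1}$.

```python
import math


def shannon_entropy(x):
    """H(x) = x log2(1/x) + (1 - x) log2(1/(1 - x)) for 0 < x < 1."""
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def compositions(s, n):
    """Number of compositions of n into s positive parts: C(n-1, s-1)."""
    return math.comb(n - 1, s - 1)


def entropy_from_compositions(p, q, n):
    """(1/N) log2 C(xN, N) for rational x = p/q and N = q * n."""
    big_n = q * n
    s = p * n  # this is x * N, an integer by construction
    return math.log2(compositions(s, big_n)) / big_n


exact = shannon_entropy(0.25)
approx = entropy_from_compositions(1, 4, 2000)  # N = 8000
```

The finite-$N$ value lies slightly below $H(x)$, with a gap of order $(\log_2 N)/N$ coming from the Stirling correction.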

The next Theorem 2 estimates the Hausdorff dimension of the sets

$$E(p) = \Big\{x \in (0, 1) : \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} |x_{i+1} - x_i| = p\Big\} \tag{18}$$

$$E_K(p) = \Big\{x \in B_K : \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} |x_{i+1} - x_i| = p\Big\} \tag{19}$$

where $0 < p < 1$ and $K > 1$ are arbitrary.

Theorem 2 The inequalities

$$\dim(E(p)) \le H(p), \qquad \dim(E_K(p)) \le H_K(p) \tag{20}$$

are true.

The following result establishes a relationship between a system's $\gamma$-characteristic and
the Hausdorff dimension of its conjugate attractor.

Corollary 1 If a deterministic system generates binary sequences (binary $\beta_K$
sequences), then for the Hausdorff dimension of the attractor $A$ we have

$$\dim(A) \le H(\gamma) \qquad (\dim(A) \le H_K(\gamma)) \tag{21}$$

where $\gamma$ is the system's response characteristic defined by Eq. (9).

The function $H$ in (21) is the Shannon entropy function, which has the simple analytic form (14). We
do not have such a simple expression for the function $H_K$. The generating function of the
numbers $C_K(s, N)$ from (17), by means of which $H_K$ is defined, is well known ([10],
Ch. 4.3):

$$\sum_{N=0}^{\infty} C_K(s, N)\, q^N = (q + q^2 + \cdots + q^K)^s;$$

using this formula, as in [10], Ch. 4.3, it can be obtained 

Some asymptotic relations for this quantity can be found in [11] (see [10], Ch. 4, Comments;
the work [11] remained unavailable to us). They can be used to derive the explicit
analytic form of the function $H_K(x)$. With respect to inequality (20) we note also the work
[29], where the positive discrepancy of fractal measures from the Shannon entropy is discussed.
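The generating function for the restricted compositions gives a direct way to tabulate $C_K(s, N)$: it is the coefficient of $q^N$ in the $s$-th power of $q + q^2 + \cdots + q^K$. A minimal Python sketch follows; the function name is hypothetical and the polynomial power is computed by plain repeated multiplication, which is sufficient for small arguments.

```python
def restricted_compositions(s, n, k):
    """C_K(s, N): compositions of n into s parts, each part in 1..k.

    Computed as the coefficient of q^n in (q + q^2 + ... + q^k)^s
    by repeated polynomial multiplication.
    """
    coeffs = [1]  # the polynomial 1, i.e. the empty product
    for _ in range(s):
        new = [0] * (len(coeffs) + k)
        for power, c in enumerate(coeffs):
            if c:
                for part in range(1, k + 1):
                    new[power + part] += c
        coeffs = new
    return coeffs[n] if 0 <= n < len(coeffs) else 0


# Compositions of 4 into 2 parts not exceeding 3: (1,3), (2,2), (3,1).
count = restricted_compositions(2, 4, 3)
```

Dividing the base-2 logarithm of such counts by $N$, with $s = xN$, gives finite-$N$ approximations of $H_K(x)$ in the sense of (17).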



2.3 Conjugate orbits and bifurcation points 

The conjugate orbits in fact magnify the small fluctuations of the process. Such fluctuations
usually occur at unstable branch points [5, 6]. The notion of a conjugate orbit allows one to
establish that each branch point on the bifurcation diagram of the original system can
be treated as a source of chaos for some 'shifted' conjugate system.

The computations were made for the logistic map $T : x \mapsto r x (1 - x)$. Namely, let

$$b_1 < b_2 < \cdots < b_k < \cdots$$

be the bifurcation points of the orbit $X = (x_n)_{n=1}^{\infty}$ of iterates $x_{n+1} = T(x_n)$ $(n \ge 1)$,
numbered in increasing order. It is known [15] that

$$\lim_{k\to\infty} b_k = b_\infty = 3.5699\ldots.$$

Given $N_k$ tending to infinity, we consider the shifted sequences $Y_k = (x_{n+N_k})_{n=1}^{\infty}$. The claim
is that if $N_k$ goes to $\infty$ quickly enough, then the orbits $\nu_k$, conjugate to $Y_k$, demonstrate
chaotic behavior at the points $b_1, b_2, \ldots, b_k$, while they are identically zero for all
other values of the control parameter $r$ belonging to the interval $(0, b_\infty)$.

This gives an algorithmic formalization of the descriptive remarks, mentioned in Section 1,
from [5] and [6] relating to small fluctuations. The same analysis can be implemented
for many other systems exhibiting bifurcations. Thus, the bifurcation diagrams of the
Poincaré displacement of the Duffing equation ([13], Ch. 11.5), the Rössler system ([41], p. 46),
and the forced magnetic oscillator ([12], Ch. 2) are very similar to that of the
logistic map considered here ([14], Ch. 3).

Bifurcations are often treated as precursors of chaos. Hence, any methods for
their detection are of great practical interest (see, e.g., [12]).

3 Random processes 

3.1 Sequences of random independent variables 

For the case of random processes the difference analysis has richer consequences and
permits theoretical study. We consider discrete random processes of the form

$$\xi = (\xi_1, \xi_2, \ldots, \xi_n, \ldots) \tag{22}$$

coordinates $\xi_n$ of which take the binary values 0 and 1 with some positive probabilities,

$$P(\xi_n = 0) = p_n, \qquad P(\xi_n = 1) = q_n \qquad (p_n + q_n = 1).$$

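Then the differences

$$\xi_n^{(k)} = \left| \xi_{n+1}^{(k-1)} - \xi_n^{(k-1)} \right| \qquad (n, k \ge 1,\ \xi_n^{(0)} = \xi_n)$$

also take binary values with some positive probabilities,

$$P(\xi_n^{(k)} = 0) = p_n^{(k)}, \qquad P(\xi_n^{(k)} = 1) = q_n^{(k)} \qquad (p_n^{(k)} + q_n^{(k)} = 1).$$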

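Hence, one can consider the random difference processes

$$\xi^{(k)} = (\xi_1^{(k)}, \xi_2^{(k)}, \ldots, \xi_n^{(k)}, \ldots); \tag{23}$$

we will also deal with the corresponding random variables of the form

$$\xi^{(k)} = 0.\xi_1^{(k)} \xi_2^{(k)} \ldots \xi_n^{(k)} \ldots$$

(the notation is the same as for the sequences (23)). It is easy to see that

$$\xi^{(k)} = \mathcal{R}^{(k)} \xi,$$

where $\mathcal{R}^{(k)}$ is the $k$-th iterate of the "fractal map" $\mathcal{R}$ from [2]:

$$\mathcal{R}\, \xi = 0.(\xi_1 \oplus \xi_2)(\xi_2 \oplus \xi_3) \ldots (\xi_n \oplus \xi_{n+1}) \ldots$$

(we use the notation $\alpha \oplus \beta$ $(= |\alpha - \beta|)$ for the logical sum of the binary variables $\alpha$ and $\beta$).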

We are interested in the limiting behavior of these differences when $k$ goes to infinity.
We say that $\xi^{(k)}$ converges to a random sequence $\xi^{(\infty)}$ if the $p_n^{(k)}$ tend to some numbers $p_n^{(\infty)}$
as $k \to \infty$ and $k \in \Lambda$ (convergence in probability); here $\Lambda$ is a given infinite subset
of the natural series and the final probabilities may depend on $\Lambda$, $p_n^{(\infty)} = p_n^{(\infty)}(\Lambda)$. Then

$$\xi^{(\infty)} = \xi^{(\infty)}(\Lambda)$$

is again a discrete random process, binary components of which take the values 0 and 1 with
some final (or stationary, to follow the terminology of Markov processes) probabilities,

$$P(\xi_n^{(\infty)} = 0) = p_n^{(\infty)}, \qquad P(\xi_n^{(\infty)} = 1) = q_n^{(\infty)} \qquad (p_n^{(\infty)} + q_n^{(\infty)} = 1).$$

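The following three statements are the basic tools we apply for studying the limiting
behavior of differences of binary random sequences. For $\varepsilon_i \in \{0, 1\}$ we use the notation

$$\langle \varepsilon_0, \varepsilon_1, \ldots, \varepsilon_k \rangle = \Big( \sum_{i=0}^{k} \varepsilon_i C_k^i \Big) \bmod 2.$$

Lemma 1 For the probabilities of the $k$-th $(k \ge 0)$ differences we have

$$P(\xi_n^{(k)} = \lambda) = \sum_{\langle \varepsilon_0, \varepsilon_1, \ldots, \varepsilon_k \rangle = \lambda} P(\xi_n = \varepsilon_0)\, P(\xi_{n+1} = \varepsilon_1) \cdots P(\xi_{n+k} = \varepsilon_k) \tag{24}$$

where $n \ge 1$ and $\lambda \in \{0, 1\}$ are arbitrary.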
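The bracket notation reflects the fact that the $k$-th absolute difference of a binary sequence is a sum modulo 2 of the original terms weighted by binomial coefficients. The Python sketch below, with hypothetical helper names, compares the iterated XOR with this formula and evaluates the right-hand side of (24) by enumeration. Independence of the bits and the parametrization by the probabilities of the value 1 are assumptions of the example.

```python
from itertools import product
from math import comb


def kth_difference_first_term(eps):
    """First term of the k-th difference of eps (length k + 1), by iterated XOR."""
    d = list(eps)
    while len(d) > 1:
        d = [a ^ b for a, b in zip(d, d[1:])]
    return d[0]


def bracket(eps):
    """<e_0, ..., e_k> = (sum_i e_i * C(k, i)) mod 2."""
    k = len(eps) - 1
    return sum(e * comb(k, i) for i, e in enumerate(eps)) % 2


def difference_probability(lam, probs_of_one):
    """Right-hand side of (24) for independent bits.

    probs_of_one[i] is P(xi_{n+i} = 1); the list has length k + 1.
    """
    total = 0.0
    for eps in product((0, 1), repeat=len(probs_of_one)):
        if bracket(eps) == lam:
            weight = 1.0
            for e, q in zip(eps, probs_of_one):
                weight *= q if e == 1 else 1.0 - q
            total += weight
    return total


# First difference of two independent bits with P(1) = 0.3 and 0.4.
p_one = difference_probability(1, [0.3, 0.4])
```

The enumeration has $2^{k+1}$ terms, so it is only practical for small $k$; it serves as a check, not as a computational method.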

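It is convenient to represent the probabilities in the form

$$P(\xi_n = \lambda) = \tfrac{1}{2}\big(1 + (-1)^{\lambda} \pi_n\big), \qquad P(\xi_n^{(k)} = \lambda) = \tfrac{1}{2}\big(1 + (-1)^{\lambda} \pi_n^{(k)}\big)$$

where $-1 < \pi_n, \pi_n^{(k)} < 1$ are some numbers. The proof of Lemma 2 is based on the
following remark: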






Remark 1 The identity 



<eo,ei,...,efc>=A ^ 0<i<fe, ai_fc=l i=0 



The following lemma gives an explicit expression of $\pi_n^{(k)}$ by means of $\pi_n$. To formulate
it, we consider the binary analogue $P = (\alpha_{i,k})$, $0 \le i \le k$, $k \ge 0$, of the Pascal triangle of binomial
coefficients, defined as follows: $\alpha_{0,k} = \alpha_{k,k} = 1$ and

$$\alpha_{i,k} = \begin{cases} 0, & C_k^i \text{ is even} \\ 1, & C_k^i \text{ is odd} \end{cases} \qquad (= \alpha_{i-1,k-1} \oplus \alpha_{i,k-1})$$

for $1 \le i \le k - 1$. The fractal graphical image of the triangle $P$ can be found in [15],
where it is considered in connection with cellular automata theory.

Lemma 2 For arbitrary $n, k \ge 1$ the equality

$$\pi_n^{(k)} = \prod_{0 \le i \le k,\ \alpha_{i,k} = 1} \pi_{n+i} \tag{25}$$

is true.
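The product formula of Lemma 2 can be checked by brute force for small $k$. The sketch below is an illustration under stated assumptions: the bits are independent, and the same sign convention $P(\cdot = 0) - P(\cdot = 1) = \pi$ is used both for the original bits and for the difference. The function names are hypothetical.

```python
from itertools import product
from math import comb


def binary_pascal_row(k):
    """Row k of the binary Pascal triangle: alpha_{i,k} = C(k, i) mod 2."""
    return [comb(k, i) % 2 for i in range(k + 1)]


def pi_product(pis, k):
    """Product of pis[i] over those 0 <= i <= k with alpha_{i,k} = 1."""
    result = 1.0
    for i, alpha in enumerate(binary_pascal_row(k)):
        if alpha:
            result *= pis[i]
    return result


def pi_by_enumeration(pis, k):
    """P(diff = 0) - P(diff = 1) for the k-th difference, by brute force.

    Assumes independent bits with P(bit_i = 0) = (1 + pis[i]) / 2.
    """
    row = binary_pascal_row(k)
    value = 0.0
    for eps in product((0, 1), repeat=k + 1):
        weight = 1.0
        for e, p in zip(eps, pis):
            weight *= (1.0 + p) / 2.0 if e == 0 else (1.0 - p) / 2.0
        parity = sum(e * a for e, a in zip(eps, row)) % 2
        value += weight if parity == 0 else -weight
    return value


pis = [0.2, -0.5, 0.7, 0.1, 0.9, -0.3]
pairs = [(pi_product(pis, k), pi_by_enumeration(pis, k)) for k in range(1, 6)]
```

The agreement follows from independence: the expectation of $(-1)$ raised to a sum of independent bits factorizes into the product of the individual expectations.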

We are interested in the existence of the limit of $\pi_n^{(k)}$ when $k \to \infty$. When $k$ tends
to infinity arbitrarily, this limit may not exist. For instance, in the simplest case $p_n = p$
(or $\pi_n = \pi$), it follows from Remark 2 below that if the binary code of the number $k$ contains
exactly $m$ units, then $\pi_n^{(k)} = \pi^{2^m}$. Since for every given $m$ the collection $\Lambda_m$ of all such
numbers $k$ is infinite, it is clear that the limit mentioned, generally speaking, does not
exist. On the other hand, for arbitrary given $\pi_n$ Lemma 2 in principle allows one to describe
all of the infinite subsets $\Lambda \subset N$ for which the limit

$$\pi_n^{(\infty)} = \pi_n^{(\infty)}(\Lambda) = \lim_{k\to\infty,\ k \in \Lambda} \pi_n^{(k)}$$

exists. In other words, Lemma 2 provides a sufficient tool to describe all of the $\Lambda$ for which
the limiting random sequence $\xi^{(\infty)}(\Lambda)$ exists: it follows from (25) that for any infinite $\Lambda$
and $n \ge 1$ the final probabilities can be computed by the formula

$$\ln \frac{1}{|\pi_n^{(\infty)}|} = \lim_{k\to\infty,\ k \in \Lambda}\ \sum_{0 \le i \le k,\ \alpha_{i,k} = 1} \ln \frac{1}{|\pi_{n+i}|} \tag{26}$$

provided the right hand limit exists (or is infinite).

Most of the further results relate to the study of these final processes in a particular
case: we consider the limiting transition of the differences $\xi^{(k)}$ as $k \to \infty$ and $k \in \Lambda_0$,
where

$$\Lambda_0 = \{k = 2^p - 1 : p = 1, 2, \ldots\}.$$

Then for arbitrary $k \in \Lambda_0$ we have all of the $\alpha_{i,k} = 1$ and hence relation (25) takes the simpler
form

$$\pi_n^{(k)} = \prod_{i=0}^{k} \pi_{n+i}. \tag{27}$$

From this it immediately follows:
Theorem 3 The limiting process $\xi^{(\infty)}(\Lambda_0)$ is the symmetric random walk, i.e. $\pi_n^{(\infty)} = 0$,
if and only if

$$\sum_{n=1}^{\infty} \ln \frac{1}{|\pi_n|} = \infty. \tag{28}$$

In the contrary case, when this series is convergent, we have

$$|\pi_n^{(\infty)}| = \prod_{i=n}^{\infty} |\pi_i| \quad (= |\pi_n||\pi_{n+1}||\pi_{n+2}| \cdots). \tag{29}$$
It is clear that if j9„ = const, then (28) holds. If 7r„ = 2^" (^ > 1), we have an example 
of a self-conjugate system: = ^. For Poisson distribution, whenp„ = e~'^A"/n!, using 
(27) it can be easily computed 

- Pnl = o (Pn) (n ^ oo) . 

The same relation is valid also for the homogeneous Poisson flow of events with probabilities
$P_n(t) = e^{-\lambda t} (\lambda t)^n / n!$. Indeed, (following [16], Ch. 6.5) for a given $t$ we divide the time
interval $[0, t]$ into equal intervals of length $1/n$ and on the obtained finite lattice consider
the Bernoulli trial scheme with success probability equal to $\lambda t / n$. Since the success
probability in $s$ trials tends (as $n \to \infty$) to the function $P_n(t)$, we obtain

$$|P_n^{(\infty)}(t) - P_n(t)| = o(P_n(t)) \qquad (n \to \infty). \tag{30}$$

The next statement is an immediate consequence of Theorem 3. 

Corollary 2 The random sequence $\eta = (\eta_1, \eta_2, \eta_3, \ldots)$ is a limiting process (for some $\xi$, i.e.
$\eta = \xi^{(\infty)}(\Lambda_0)$) if and only if either $\eta$ is the symmetric random walk or the
sequence $|\pi_n(\eta)|$ increases monotonically to 1 as $n \to \infty$.

Corollary 3 If $\eta$ is a limiting random sequence (for some $\xi$, i.e. $\eta = \xi^{(\infty)}(\Lambda_0)$), differing
from the symmetric random walk, then the random variable

$$0.\eta_1 \eta_2 \eta_3 \ldots$$

has a purely singular probability distribution.

The latter statement follows from Marsaglia's results [17]: Marsaglia's criterion for a
random variable $\eta$ to possess a singular distribution is

$$\sum_{n=0}^{\infty} |\pi_n(\eta)|^2 = \infty.$$

Corollary 2 provides a stronger condition: $|\pi_n(\eta)| \to 1$. Since we have

$$1 - |\pi_n| = 2 \min(p_n, q_n),$$

the convergence of the series in (28) is equivalent to the condition

$$\sum_{n=1}^{\infty} \min(p_n, q_n) < \infty.$$

According to [17], this is also equivalent to the requirement that the random variable $\eta$ from
Corollary 3 has a discrete probability distribution.

If $\pi_n$ = const, then according to Theorem 3 the process $\xi^{(\infty)}(\Lambda_0)$ is the symmetric random walk.
It can be shown that in this case Theorem 3 remains valid when the limiting transition
is taken over some "wide" subsets $\Lambda$ of the natural series:

Theorem 4 If $\pi_n$ = const, then there exists a set $\Lambda \subset N$ with density 1 in the natural series
such that $\xi^{(\infty)}(\Lambda)$ is the symmetric random walk.

Here, as such a $\Lambda$ one can choose a sequence of natural numbers for which the total
number of units in their binary codes increases to infinity quickly enough. Theorem 4
follows from the next proposition:

Remark 2 (1) The total number of units in the $k$-th line of the binary Pascal triangle is equal
to $2^{m(k)}$, where $m(k)$ is the total number of units in the binary code of the number $k$. (2) There
exist sets $\Lambda \subset N$ such that $\lim_{k\to\infty,\ k \in \Lambda} m(k) = \infty$ and $\mathrm{dens}(\Lambda) = 1$.
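Part (1) of Remark 2 is easy to verify numerically for small $k$, and part (2) can be illustrated by counting how many numbers up to a bound have at least a given number of units in their binary code. The following Python sketch uses hypothetical function names; the bound and the threshold in the example are arbitrary choices.

```python
from math import comb


def ones_in_pascal_row(k):
    """Number of odd binomial coefficients C(k, i), 0 <= i <= k."""
    return sum(comb(k, i) % 2 for i in range(k + 1))


def binary_weight(k):
    """m(k): number of units in the binary code of k."""
    return bin(k).count("1")


def fraction_with_weight_at_least(m, limit):
    """Share of 1 <= k <= limit whose binary code has at least m units."""
    hits = sum(1 for k in range(1, limit + 1) if binary_weight(k) >= m)
    return hits / limit


row_check = all(ones_in_pascal_row(k) == 2 ** binary_weight(k) for k in range(200))
share = fraction_with_weight_at_least(3, 2 ** 16)
```

For every fixed threshold the share tends to 1 as the bound grows, which is the mechanism behind the density-1 sets of part (2).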

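For a given $\xi$ we let

$$\gamma(\xi, k) = \xi_1^{(k)} + \xi_2^{(k)} + \cdots + \xi_{k-1}^{(k)} \tag{31}$$

be the total number of units in (the realizations of) the $k$-th difference sequence $(\xi_1^{(k)}, \ldots, \xi_{k-1}^{(k)})$;
it also coincides with the total number of changes of the binary symbol in the preceding difference sequence,

$$\gamma(\xi, k) = |\xi_2^{(k-1)} - \xi_1^{(k-1)}| + \cdots + |\xi_k^{(k-1)} - \xi_{k-1}^{(k-1)}|. \tag{32}$$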

The next two statements follow from Theorems 3, 4 and Remark 2 and relate to the
$\gamma$-characteristic of random sequences: as in (9), given an infinite $\Lambda \subset N$, we define

$$\gamma(\xi, \Lambda) = \lim_{k\to\infty,\ k \in \Lambda} \frac{\gamma(\xi, k)}{k} \tag{33}$$

where the existence of the limit is assumed. We note again that the limit in (33) with
$\Lambda = N$, generally speaking, may not exist, and then one has to study the corresponding
non-trivial cluster sets of the ratio in (33). The same situation holds also for the Lyapunov
exponent defined by Eq. (10).
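The quantity $\gamma(\xi, k)$ can be computed directly from a realization of a binary sequence. The Python sketch below counts the units among the first $k - 1$ terms of the $k$-th difference and, separately, the symbol changes among the first $k$ terms of the preceding difference. The function names, the use of $k - 1$ terms, the sample length $2k$ and the bit probability are assumptions made for the illustration.

```python
import random


def kth_binary_difference(bits, k):
    """k-th order absolute difference of a 0/1 sequence, via repeated XOR."""
    d = list(bits)
    for _ in range(k):
        d = [a ^ b for a, b in zip(d, d[1:])]
    return d


def gamma_count(bits, k):
    """Number of units among the first k - 1 terms of the k-th difference."""
    return sum(kth_binary_difference(bits, k)[: k - 1])


def symbol_changes(bits, k):
    """Symbol changes among the first k terms of the (k-1)-th difference."""
    prev = kth_binary_difference(bits, k - 1)[:k]
    return sum(1 for a, b in zip(prev, prev[1:]) if a != b)


def gamma_ratio(bits, k):
    """The ratio gamma(xi, k) / k whose limit defines the characteristic."""
    return gamma_count(bits, k) / k


rng = random.Random(0)                 # fixed seed for reproducibility
k = 255                                # a value of the form 2**p - 1
sample = [1 if rng.random() < 0.3 else 0 for _ in range(2 * k)]
ratio = gamma_ratio(sample, k)
```

The two counts agree for any input of length at least $2k - 1$, since each term of the $k$-th difference equals 1 exactly when two adjacent terms of the preceding difference differ.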

We consider a particular case of the processes $\xi$ from (22), requiring

$$P(\xi_n = 0) > P(\xi_n = 1) \tag{34}$$

or, what is the same, $\pi_n > 0$ for all $n > 0$.
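Theorem 5 Let $\xi = (\xi_n)_{n=1}^{\infty}$ be a random process satisfying (34). If

$$\sum_{n=1}^{\infty} P(\xi_n = 1) < \infty, \quad \text{then with probability 1} \quad \lim_{k\to\infty,\ k \in \Lambda_0} \frac{\gamma(\xi, k)}{k} = 0,$$

and if

$$\sum_{n=1}^{\infty} P(\xi_n = 1) = \infty, \quad \text{then with probability 1} \quad \lim_{k\to\infty,\ k \in \Lambda_0} \frac{\gamma(\xi, k)}{k} = \frac{1}{2}.$$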

Corollary 4 If $\pi_n = \pi$, then there exists $\Lambda \subset N$ such that $\mathrm{dens}(\Lambda) = 1$ and $\gamma(\xi, \Lambda) = 1/2$.

3.2 Conjugate attractors of identically distributed sequences 

When considering the random sequences (22), one may assume ([16], Ch. 8.6 and [22],
Ch. 4.3) that

$$\xi_n = \xi_n(\omega) = \omega_n, \quad \text{where } \omega = 0.\omega_1 \omega_2 \omega_3 \ldots; \tag{35}$$

here $\omega$ is a real number from the unit interval, given in the form of its binary expansion. We are
interested in the Hausdorff dimension of the sets

$$M = M(p) = \Big\{\omega \in [0, 1] : \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \omega_i = p\Big\} \tag{36}$$

where it is assumed that $P(\xi_n = 1) = p$ = const. If $p = 1/2$, then $M$ is the set of so-called Borel
normal numbers ([7]), for which $\mathrm{mes}(M) = 1$ and $\dim(M) = 1$ (mes is the Lebesgue measure
on $[0, 1]$). Hence, if in (36) $p \ne 1/2$, then $\mathrm{mes}(M) = 0$. However, in that case there exists
a singular measure $\nu$ on the interval $[0, 1]$ such that $\mathrm{supp}(\nu) = M$, $\nu(M) = 1$ ([7], Ch. 4).
The Eggleston theorem provides us with an explicit expression for the Hausdorff dimension:

$$\dim(M(p)) = H(p) \tag{37}$$

where $H$ is the Shannon function (14). The derivative of the distribution function of the measure $\nu$,
i.e. the probability density of the random variable

$$0.\xi_1(\omega) \xi_2(\omega) \xi_3(\omega) \ldots,$$

is a singular function, concentrated on a set of zero Lebesgue measure. The self-affine
graph of such a function can be found in [7], Ch. 3.1.

For a given stochastic system it is interesting to describe all of the attractors $A_m$
(we assume they are numbered in increasing order of their Hausdorff dimension) mentioned in
Section 2.1. This can be easily done for the case $\pi_n$ = const $(= \pi)$. Indeed, if we let

A^ = {A;e A^:A; = 2*' + m, p = 0,l,2,...} (38) 

then according to Remark 2 for the limiting process we have 7r^°°-'(A^) = tt^"". For 
the terms of conjugate orbit v we have 

= .... 
From this it is clear that the conjugate attractor $A_m$ corresponding to the random variable
$\xi^{(\infty)}(\Lambda_m)$ coincides with the set on which the sequence

$$0.\xi_1^{(\infty)} \xi_2^{(\infty)} \xi_3^{(\infty)} \ldots$$

is localized (with probability 1). Then the Eggleston theorem implies

Theorem 6 For a sequence of identically distributed random variables, $\pi_n = \pi$, the set
of all its conjugate attractors is a collection of Eggleston sets,

Am^M{^^) (m = 0,1,2,...) . (39) 

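If we let

$$\tilde H(x) = H\Big(\frac{1 + x}{2}\Big) \qquad (\tilde H(x) = \tilde H(-x),\ 0 \le \tilde H(x) \le 1), \quad -1 < x < 1,$$

then Theorem 6 gives $\dim(A_m) = \tilde H(\pi^{2^m})$, hence $\tilde H^{-1}(\dim(A_m)) = \pi^{2^m}$, and thus the
following statement is true:

Theorem 7 For the Hausdorff dimensions of the conjugate attractors $A_m$ of an identically
distributed random sequence the equalities

$$\big(\tilde H^{-1}(\dim(A_1))\big)^{2^{-1}} = \big(\tilde H^{-1}(\dim(A_2))\big)^{2^{-2}} = \cdots = \big(\tilde H^{-1}(\dim(A_m))\big)^{2^{-m}} = \cdots \tag{40}$$

hold.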

3.3 Binary Markov processes 

Here we give analogues of some results from Sec. 3.1 for the case of infinite binary Markov
chains.

Remark 3 Let $\xi = (\xi_n)_{n=1}^{\infty}$ be a binary Markov chain with the transition probabilities
$\pi_n(x, y) = P(\xi_n = y \mid \xi_{n-1} = x)$ and with the probabilities $p_n(x) = P(\xi_n = x)$ of attaining
the value $x$ in $n$ steps by the finite chain $(\xi_k)_{k=1}^{n}$. Then for every $k \ge 1$ the difference
sequence $\xi^{(k)} = (\xi_n^{(k)})_{n=1}^{\infty}$ is also a Markov chain and for the corresponding probabilities
$\pi_n^{(k)}(x, y)$ and $p_n^{(k)}(x)$ we have the following recurrence relations:

$$p_n^{(k)}(x) = p_{n-1}^{(k)}(0)\, \pi_n^{(k)}(0, x) + p_{n-1}^{(k)}(1)\, \pi_n^{(k)}(1, x) \tag{41}$$

y) = ^ti\0, x)7:t'\x, \x-y\)+ 71^(1, 1 - x)7r^^)(l -x,l-\x- y\) (42) 

For general Markov processes, the computation of the distributions $\pi^{(\infty)}$ and $p^{(\infty)}$ for
the limiting processes can be complicated. However, we are able to compute the
distribution $p^{(\infty)}$ for homogeneous processes and for arbitrary given $\Lambda = \Lambda_s = \{2^p + s :
p = 0, 1, 2, \ldots\}$. This is based on the next two statements.
Lemma 3 If $\xi = (\xi_n)_{n=1}^{\infty}$ is a homogeneous Markov process and $\Lambda = \{2^p : p \ge 0\}$, then for
the difference processes

$\xi^{(k)} = (\xi_n^{(k)})_{n=1}^{\infty}$, $k \in \Lambda$, we have

oo 

p&-^)-q' E ((t^t;)^^'"' E Ec^-7'^p'-W (43) 

e,Se{Q,l} ^ \p/2]=\k=l 

where A e {0, 1}, x = ^, y = ^-y-^-^, s = p{l, 1), q — p{0, 0) andp{x, y) is the transaction 
probability function for the process ^. 



Remark 4 The next identity 

E E ClCi_^x^y^ = (1 + ^TTr^G^-J) (44) 

where Tm{z) — J2k=o^m-k^'' Chebyshev polynomial, is true. 
These three statements imply: 

Theorem 8 If $\xi = (\xi_n)_{n=1}^{\infty}$ is a homogeneous Markov process and $\Lambda = \{2^p : p \ge 0\}$, then
we have

to P(e'=A)^iA- ,f 7";-"' I (45) 

where A, s, and q are the same as in Lemma 3. 

It is clear from (42) that if $\xi$ is a homogeneous process, then for every $k \ge 1$ the difference
process $\xi^{(k)}$ is again homogeneous. It is now clear that the limits of the quantities $P(\xi_n^{(k)} = \lambda)$
can be computed for all the sets of indices of the form $\Lambda_s = \{2^p + s : p = 0, 1, 2, \ldots\}$. Indeed,
it is not difficult to see from Theorem 8 that this limit is

Hm P(e) = A) = |A-4^^^M;^| (46) 
keAs,k^c. 4^^1,0) + 4'''\0,1) 

It is also clear that the quantities $\pi^{(s)}$ are easily determined recursively according to
(41) and (42) (and hence their explicit computation is also possible). Thus we have the
convergence of the quantities $P(\xi_n^{(k)} = \lambda)$ along $\Lambda_s$; however, we cannot confirm the
convergence of the probabilities $\pi_n^{(k)}(x, y)$. If their convergence is known a priori,
relation (41) implies

4°°nO,l) _M l 7rW(0,l) + 7rW(l,0) 

7r^^(l,0) ' 2 4°Hl,0)7rf (0,1) ' ^ ' 

This ratio can measure the 'strength' of the Markov dependence property for the final process.
3.4 One-dimensional diffusion



In obtaining relations (30) for the Poisson flow of events, we used the fact that
this process can be approximated by discrete random processes. In the same way,
one can obtain such a result for diffusion processes. However, in this case we need to
consider some "averaged" differences, defined as

^(ef = 0) = ^(l-^f) where tt^ = (7r('=))V'= = (,r„7r„+i . . . 7r„+,) 

We will also need the (C, 1) summation of Cesàro [18]:

(C, 1) / fds = lim - / fds . 

Jx z-*-+oo z Jx 

Indeed, let the diffusion process $\xi(t)$ be determined by the Fokker-Planck-Kolmogorov equation
([16, 19, 20])

df{x,t) ^ , df{x,t) a d'^f{x,t) 

dt dx 2 dx^ ^ ' 

where $a = a(x)$ and $b = b(x)$ are some coefficients and $f(x, t)$ is the probability density of
the diffusing particle. It is well known (see e.g., [16], Ch. 5.4 and [19]) that such processes can
be obtained as a result of a limiting transition from a random walk on a one-dimensional
lattice: we consider the lattice $L_h$ with step $h > 0$ and assume that a particle changes
its position at discrete time moments proportional to some small $\tau > 0$ and with
the probabilities

X X 1 ^ 

Imposing the restriction $h^2/\tau \to a$ (one can assume $a = 1$) and using the Central Limit
Theorem, one can deduce the FPK equation as well as obtain its solution.
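The scaling $h^2/\tau \to a$ can be checked numerically. The sketch below simulates only the driftless case, a symmetric walk with jump probabilities 1/2 each; this is an assumption made for the illustration and is not the general jump law (49). The names `lattice_walk`, `samples`, and the parameter values are likewise hypothetical.

```python
import random

def lattice_walk(h, tau, t_end, rng):
    """Position at time t_end of a symmetric walk on the lattice L_h.

    One jump of +h or -h (probability 1/2 each, an assumed driftless case)
    is made every tau time units.
    """
    steps = round(t_end / tau)
    return sum(h if rng.random() < 0.5 else -h for _ in range(steps))

a = 1.0                # target diffusion coefficient (the text allows a = 1)
tau = 0.01             # small time step
h = (a * tau) ** 0.5   # lattice step chosen so that h**2 / tau == a
rng = random.Random(0)

samples = [lattice_walk(h, tau, 1.0, rng) for _ in range(4000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
# For this walk the exact variance at time t is (t / tau) * h**2 = a * t.
```

With these settings the sample variance at $t = 1$ should be close to $a$, which is the variance of the limiting diffusion without drift.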

The scheme we apply to study the difference structure of such continuous processes is
the following. If we have a discrete motion on the lattice $L_h$ whose transition probabilities
admit an equality of the form

$$\pi_n = C_n h \tag{50}$$

then, after calculations by formula (27), we transform the quantities $\bar\pi_x^{(k)}$ for the probabilities
of the $k$-th difference process taken from the random walk to the form

$$\bar\pi_x^{(k)} = C_{x,k}\, h ,$$

where the coefficients $C_{x,k}$ are such that the limit $\lim_{k\to\infty} C_{x,k} = C_x$ exists. In this
way we can consider a random walk, corresponding to the discretized difference process,
defined on the same lattice $L_h$ and with the same time scale $\tau$. For the case under consideration
(49) we have Tf^^^ — T^n — bh/2, i.e. = ^h- From where we conclude that ^^°°\t) 
coincides with ^ (t) . One can see, the same arguments lead to the following formula: 

$$|B(x)| = \exp\Bigl(-(C,1)\int_x^{\infty} \ln|b(z)|\,dz\Bigr) . \tag{51}$$

Thus the following statement holds:

Theorem 9 The limiting averaged differential process $\xi^{(\infty)}(t)$, taken from the diffusion
$\xi(t)$, is again a diffusion process. If $\xi$ satisfies equation (48), then the drift coefficient $B$
for $\xi^{(\infty)}$ can be computed by (51).

The same analysis is also applicable to general birth and death processes [16], as well
as to abstract diffusion processes considered in non-standard stochastic analysis [30],
including the Ising model. We note another possible application of the above
approach: queuing analysis in computer networks (see e.g. [20, 21]). This theory
deals with demands arriving randomly at some processors, and studies the random process of time
intervals between arrival moments. In the investigation of so-called heavy traffic
[20, 21], the diffusion approximation is of great importance: the process of queue lengths
can be approximated by the diffusion process (48) (cf. Wiener's neuron in the next section).
This area of applications is so large that it requires an independent study.

4 Neural networks 

4.1 Stochastic neuron models 

Wiener's model of the neuron ([8], p. 299) was suggested by Mandelbrot and Gerstein
(see [23, 40]). Studying actual neuron spike trains, they found that in some
instances these trains are well approximated by a (bounded) diffusion process. The idea
is that the excitatory and inhibitory signals arriving at the neuron's input can be
mathematically interpreted as a (bounded) random walk on a one-dimensional lattice with a
small constant step $h$. The limiting (as $h \to 0$) process $\xi$ is described by Ito's stochastic
differential equation ([19], Ch. 5.4)

$$d\xi(t) = \mu(x, t)\,dt + \sigma(x, t)\,dW(t) \tag{52}$$

where $W$ is the Wiener process. These authors found that, for appropriately
chosen values of the parameters $\mu$ and $\sigma$, the process $\xi(t)$ fits the experimental data on
neural activity well. Equation (52) is equivalent to the diffusion equation (48) with $\mu = b$ and
$\sigma^2 = a$ ([8], p. 299).
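Equation (52) can be explored numerically with the standard Euler-Maruyama scheme. The sketch below is a generic discretization, not a method taken from the cited works; the function name `euler_maruyama`, the coefficient values, and the omission of the boundary mentioned for the "bounded" diffusion are all assumptions made for the illustration.

```python
import math
import random

def euler_maruyama(mu, sigma, x0, t_end, n_steps, rng):
    """Simulate one path of d xi = mu(x, t) dt + sigma(x, t) dW on [0, t_end].

    Hypothetical helper; no absorbing or reflecting boundary is modeled.
    """
    dt = t_end / n_steps
    x, t = x0, 0.0
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Wiener increment over one step
        x += mu(x, t) * dt + sigma(x, t) * dw
        t += dt
        path.append(x)
    return path

rng = random.Random(1)
# Noise-free check: with sigma = 0 the path is the straight line x = 2 t.
drift_only = euler_maruyama(lambda x, t: 2.0, lambda x, t: 0.0, 0.0, 1.0, 100, rng)
# Constant drift mu = 2 and noise level sigma = 0.5 (illustrative values).
noisy = euler_maruyama(lambda x, t: 2.0, lambda x, t: 0.5, 0.0, 1.0, 100, rng)
```

In a neuron model of this type, a spike would correspond to the path first reaching a threshold level; that stopping rule is left out here.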

It is noted in [8], p. 299, that adding a noisy component to a given deterministic system
may cause a diffusion process in the system. When the noise is of small intensity, the
system shows Poisson behavior. For this reason, in some cases the neural activity can
be described by Poisson driven stochastic models ([23, 40] and [8], p. 299).

The next proposition is the main statement of this subsection. It is simply a reformulation
of some results (Theorem 9 and Eq. (30)) from Section 3.

Corollary 5 The system conjugate (in the "average" sense) to a Wiener neuron is again a
Wiener neuron. The system conjugate to a Poisson driven neuron is again an (asymptotically)
Poisson driven neuron.

Applying the approach described in Sec. 3.4, an analogous result might also be stated
for the Ornstein-Uhlenbeck neuron ([8], p. 299), which is defined as a diffusion process with time
dependent coefficients $a$ and $b$ (or $\mu$ and $\sigma$) of some special form.

4.2 New type of memory in neural networks

We consider the update equation ([8], pp. 119, 230, 930) that describes the dynamics of a
neural network consisting of $n$ McCulloch-Pitts neurons:

$$x_k(t+1) = \sigma(h_k(t) - \theta_k), \quad \text{where} \quad h_k(t) = \sum_j w_{k,j}\, x_j(t) . \tag{53}$$

Here $w_{k,j}$ are the synaptic strengths, $\theta_k$ are threshold constants, $\sigma$ is the activation function,
$h_k$ is the synaptic potential, the variable $x_k$ stands for the binary state of the $k$-th neuron, and $t$ designates
discrete time. Different choices of $\sigma$, the so-called sigmoid functions, are possible. It is
customary to include a probabilistic "noisy" term in this equation ([8], p. 930):

$$P(x_k(t+1) = s) = \tfrac{1}{2}\bigl(1 + (-1)^s \pi_k\bigr) ; \tag{54}$$

here one can choose, e.g. ([8], p. 902),

$$\pi_k = \tanh[T^{-1} h_k(t)] ,$$

where the variable $T > 0$ (temperature) reflects the level of noise.
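A minimal sketch of one noisy update step follows. It takes Eq. (54) at face value as $P(x_k(t+1) = s) = \frac{1}{2}(1 + (-1)^s \pi_k)$ for $s \in \{0, 1\}$ with $\pi_k = \tanh(h_k/T)$; this reading of the sign convention, the function names `state_probability` and `noisy_update`, and the example weights are assumptions made for the illustration.

```python
import math
import random

def state_probability(s, h_k, temperature):
    """P(x_k(t+1) = s) for s in {0, 1}, reading Eq. (54) as (1 + (-1)**s * pi_k) / 2."""
    pi_k = math.tanh(h_k / temperature)
    return 0.5 * (1.0 + (-1) ** s * pi_k)

def noisy_update(weights, x, temperature, rng):
    """Draw the next binary state of every neuron from the assumed form of Eq. (54)."""
    new_x = []
    for row in weights:
        h_k = sum(w * x_j for w, x_j in zip(row, x))   # synaptic potential of Eq. (53)
        p_one = state_probability(1, h_k, temperature)
        new_x.append(1 if rng.random() < p_one else 0)
    return new_x

rng = random.Random(0)
weights = [[0.0, 1.0], [1.0, 0.0]]   # illustrative two-neuron coupling
x_next = noisy_update(weights, [1, 0], 0.5, rng)
```

For large $T$ both states become equally likely, while for small $T$ the update is nearly deterministic, which is the sense in which the temperature measures the noise level.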

The two main problems investigated in neural networks are the retrieval and learning
problems. The first one studies the dynamics of the neural states $x_k$ provided the connections
$w_{ij}$ are time independent and fixed. Of most interest is revealing the attractors
of the dynamics (54), which are considered as the memory storage of the given network. The
basic result is the Hopfield theorem, which establishes that under some restrictions on the matrix
$W$ the configuration point

$$x = x(t) = (x_1(t), x_2(t), \ldots, x_n(t), \ldots) \tag{55}$$

converges to some fixed-point attractor. Hopfield nets consist of spin neurons taking
values $\pm 1$. The existence of an 'energy' function ([8], p. 363) associated
with $W$ was proposed in [28],

$$E = E(W) = -\sum_{i,j} w_{ij}\, s_i s_j , \tag{56}$$

which, under certain restrictions, is a Lyapunov function ([8], pp. 363, 230): the
value of $E$ does not increase under any update of the spins. The local minima of the function $E$
are treated as the attractor memory states. This memory can take about 20% of the whole
configuration space [28]. Some other works in neural networks ([8], p. 258), generalizing
Hopfield's approach, introduce and study networks of chaotic elements. The aim
is to provide for the existence of a hierarchical memory storage of coexisting attractors.
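The Lyapunov property of the energy can be checked numerically. The sketch below uses the conventional setting in which it is known to hold: symmetric weights with zero diagonal and asynchronous sign updates. These restrictions, the factor 1/2 in the energy, the network size, and the names `energy` and `async_update` are assumptions made for the illustration and are not taken from [28].

```python
import random

def energy(w, s):
    """Energy -1/2 * sum_ij w_ij s_i s_j (the factor 1/2 is a conventional normalization)."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def async_update(w, s, i):
    """Set spin i to the sign of its local field (ties resolved to +1)."""
    field = sum(w[i][j] * s[j] for j in range(len(s)))
    s[i] = 1 if field >= 0 else -1

rng = random.Random(2)
n = 8
# Symmetric weight matrix with zero diagonal (assumed restriction on W).
w = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        w[i][j] = w[j][i] = rng.uniform(-1.0, 1.0)

s = [rng.choice((-1, 1)) for _ in range(n)]
energies = [energy(w, s)]
for _ in range(5 * n):
    async_update(w, s, rng.randrange(n))
    energies.append(energy(w, s))
# Each single-spin update changes the energy by a non-positive amount.
```

The recorded sequence `energies` is non-increasing, so the dynamics can only stop at a local minimum of $E$, that is, at a fixed-point attractor.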

The analysis of the previous sections reveals a new type of attractor of the dynamics
(54) and therefore a new type of memory in neural networks. Indeed, for an arbitrary $k$-th
neuron we consider its states

$$x_k = (x_k(1), x_k(2), \ldots, x_k(t), \ldots) ,$$

defined by (55), as a discrete random process. Although we have mostly dealt with
independent random variables, the results of Section 3.3 show that the difference analysis
is also applicable to the case of binary random Markov processes, an important restriction
(see e.g. [8, 27, 28]) usually imposed on the process (53) (or (54)) of the brain states. The
relation (26) in principle allows one to compute all of the final processes $\xi^{(\infty)}(A)$ corresponding
to those $A \subset \mathbb{N}$ for which the final probabilities $\pi^{(\infty)}(A)$ exist. The discrete processes

$$\bigl(\xi_1^{(\infty)}(A_1),\ \xi_2^{(\infty)}(A_2),\ \ldots,\ \xi_n^{(\infty)}(A_n),\ \ldots\bigr) \tag{57}$$

can be treated as final (or stationary) processes of the network dynamics given by
Eqs. (54) and (55). The Cartesian products

$$\mathcal{A}_1^{(\infty)} \times \mathcal{A}_2^{(\infty)} \times \cdots \times \mathcal{A}_n^{(\infty)} \times \cdots \tag{58}$$

($1 \le n < \infty$), where $\mathcal{A}_n^{(\infty)}$ is an attractor for the conjugate orbit on which the random
variable $\xi_n^{(\infty)}(A_n)$ is localized, are the attractors for the configuration point (55) and therefore
can be treated as a new type of memory of the given neural network.

The learning or task-adapted problems considered in neural network theory have an
inverse statement: given a set of patterns of configuration points, determine the matrix
$W$ for which the fixed point attractors coincide with this set of patterns. It is supposed
that the finite number of given patterns to be learned by the network are
random and have identically distributed components ([8], p. 651). In the
framework of the analysis presented, the learning problem can be stated as follows: given
random processes of the form

$$\eta = (\eta_1, \eta_2, \ldots, \eta_n, \ldots) , \tag{59}$$

determine the matrix $W$ of adjustable connections in such a way that these $\eta$ coincide
with some final processes (58) with some $A_j \subset \mathbb{N}$.

4.3 Elimination of noise 

Real numbers are sometimes represented in the "pulse density system" [31]; e.g.,
the limit in the relation (36) represents the number $p$ in this system. The density of
a finite number of alternating signals is easily expressed through the densities of its constituents.
In this system, all the basic operations with signals are also possible [31].
This representation is also used in the study of multidimensional stochastic systems
(e.g., in investigating the baker's transformation [33]). The difference attractor A for
such a complex system reflects the synthetic information on all the constituents. Regarding
the alteration of information due to the timing of the signals, see also [8], p. 693.
We have (see Eqs. (31) and (33)) 

and therefore $\gamma$ is a type of density. A device (the charge
capacitor) transforming the pulse density to an analog quantity is mentioned in [31]. This remark in fact
indicates a way to set up a correspondence between the $\gamma$-coefficient of neural activity and
the neuron's electrical characteristics. Hence $\gamma$ can be treated as a mixed analog-digital
characteristic of neural activity: according to von Neumann [31], the neuron
is able to combine analog and digital features in its activity.

We use these remarks to deduce a new type of neuron firing condition, based on the
analysis of Sec. 3 and formulated in probabilistic terms. The accepted condition for neuron
firing says that the total electrical charge in the neuron should exceed some threshold level.
In formal neural networks this is reflected in the update equation (53): a weighted
sum of input signals should exceed some threshold level. On the other hand, the work
[31] claims that the actual firing conditions may have a very different form. The next
suggestion is derived from the consideration of discrete random processes. As one can see, it has
a purely probabilistic character and does not refer to any neural context.

Consider a neuron receiving on its input signals from other neurons. As in the
previous section, we assume that these signals are independent random variables
$\eta_k$ taking the values $+1$ (excitatory signal) and $-1$ (inhibitory signal). We assume also
that we deal with a discrete-time process and that the $\eta_k$ are ordered by increasing
arrival time. Letting $\xi_k = (1 + \eta_k)/2$, we have on the input of the neuron a random binary
process $\xi$. Such processes have been considered in Sec. 3 and therefore all the results of
that section can be applied. We are interested in the statement of Theorem 5, which we
now reformulate as follows:

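Corollary 6 Let $\xi = (\xi_n)_{n=0}^{\infty}$ be a random binary sequence with independent terms $\xi_n$
satisfying (34). If

$$\sum_{n=0}^{\infty} P(\xi_n = 1) < \infty, \quad \text{then with probability 1} \quad H(\gamma(\xi)) = 0 ,$$

and if

$$\sum_{n=0}^{\infty} P(\xi_n = 1) = \infty, \quad \text{then with probability 1} \quad H(\gamma(\xi)) = 1 .$$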
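The two regimes of Corollary 6 are distinguished only by whether the series of probabilities converges. The sketch below illustrates that dichotomy with two hypothetical choices of $P(\xi_n = 1)$; it does not compute $\gamma$ or $H$, and it does not check condition (34). The names `partial_sum`, `summable`, and `divergent` are assumptions made for the example.

```python
def partial_sum(prob, n_terms):
    """Sum of prob(n) for n = 0, ..., n_terms - 1."""
    return sum(prob(n) for n in range(n_terms))

# Hypothetical firing probabilities P(xi_n = 1).
summable = lambda n: 1.0 / (n + 1) ** 2   # series converges (to pi**2 / 6)
divergent = lambda n: 1.0 / (n + 1)       # harmonic series, diverges

for n_terms in (10, 1000, 100000):
    print(n_terms, partial_sum(summable, n_terms), partial_sum(divergent, n_terms))
```

The first column of sums stays bounded, which corresponds to the case $H(\gamma(\xi)) = 0$, while the second grows without bound, which corresponds to $H(\gamma(\xi)) = 1$.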

Suppose now we have an infinite number of binary random variables (neurons) $\xi_k(t)$ with a
given distribution of probabilities

$$P(\xi_k(t) = 0) = p_k(t), \quad P(\xi_k(t) = 1) = q_k(t) \quad (p_k(t) + q_k(t) = 1) . \tag{60}$$

Here $k \ge 1$ is the neuron's number and $t = 0, 1, 2, \ldots$ is the discrete time. If we understand
the realization of the random variable $\xi_k(t)$ as

$$\xi_k(t) = \begin{cases} 1, & \text{the $k$-th neuron is active (fired) at moment } t, \\ 0, & \text{the $k$-th neuron is inactive (silent) at moment } t, \end{cases}$$

it can be said that every probabilistic neural net is a random process $\mathcal{N}_P$ of the form

$$\xi(t) = (\xi_1(t), \xi_2(t), \ldots, \xi_k(t), \ldots) . \tag{61}$$

Now we introduce a deterministic network $\mathcal{N}_D$ associated with the process $\mathcal{N}_P$. Given two
infinite matrices, the stochastic matrix $P = (p_k(t))_{k,t=0}^{\infty}$ of the probabilities (60) and the
binary matrix of connections $W = (w_{ij})_{i,j=0}^{\infty}$,

$$w_{ij} = \begin{cases} 1, & \text{the $i$-th neuron affects the $j$-th neuron}, \\ 0, & \text{the $i$-th neuron does not affect the $j$-th neuron}, \end{cases}$$

we assign the evolution equation of the net $\mathcal{N}_D$ as follows. It is clear that the process
$\xi(k;t)$, consisting of all the neurons affecting the $k$-th one at the moment $t$, is the input process for the
$k$-th neuron. According to Corollary 6 the quantity

$$\Gamma(\xi(k;t)) = H(\gamma(\xi(k;t)))$$

($\Gamma = H \circ \gamma$ is the composition of $H$ and $\gamma$) is either 0 or 1. We impose the following firing
condition for the neurons of $\mathcal{N}_D$:

$$x_k(t+1) = \Gamma(\xi(k;t)) . \tag{62}$$

Let us explain these definitions. Equation (62) is the evolution equation of the
process $x(t)$: given $\xi(t)$, it allows one to compute $x(t+1)$. This means we have required
the $\gamma$-characteristic (and hence the entropy $H(\gamma)$) to be the basic quantity determining
the dynamics of the deterministic network $\mathcal{N}_D$: in order to determine its state at the next
time step (i.e. $x(t+1)$), each neuron of $\mathcal{N}_P$ computes the $\gamma$-characteristic and then the
entropy $H(\gamma)$ of its present input (the $\xi(k;t)$).

It follows from Corollary 6 that the dynamics of the net $\mathcal{N}_D$ can be represented in
the form

$$x_k(t+1) = \Delta(S_k(t)), \quad \text{where} \quad S_k(t) = \sum_j w_{k,j}\, x_j(t)\, p_j(t) \tag{63}$$

(cf. Eq. (53)), where $\Delta$ is the impulse function.

In other words, we have proved the following

Theorem 10 For every neural network $\mathcal{N}_P$ consisting of probabilistic neurons $\xi_k = \xi_k(t)$
there exists a deterministic neural network $\mathcal{N}_D$ which at each time $t$ computes the $\gamma$-characteristic
of the inputs of $\xi_k$.

The probabilistic "noisy" networks were introduced by W. Little [27] in order
to include in theoretical studies the noise actually present in the brain.
Theorem 10 shows that when we are interested in the entropy aspects of neural activity
and follow the difference approach, the noise can be eliminated. In this respect, the
theorem can also be treated as an analogue, for the case of neural networks, of the
de Leeuw-Moore-Shannon-Shapiro statements [39] from automata theory. Note also that
Theorem 10 brings the modified neural networks remarkably closer to the actual brain: the
brain is able to operate while avoiding the influence of external noise.